- .de Bs \" Begin subsection
- .RT
- .sp \\n(PDu
- .ne 1.1
- .ll -\\n(QIu
- .in +\\n(QIu
- .ls 1
- .B \\$1
- \\&\\$2
- .ti +\\n(PIu
- ..
- .de Es \" End subsection
- .ls
- .in -\\n(QIu
- .ll +\\n(QIu
- ..
- .de Bi \" Begin Illustration
- .DS B
- .ls 1
- .cs R 25
- .cs I 25
- .cs B 25
- .ss 25
- .tr -\-
- .lc \-
- ..
- .de Ei \"End Illustration
- .ls
- .DE
- .ss 0
- .cs R
- .cs I
- .cs B
- .tr --
- .lc .
- ..
- .if n .ds n1 *
- .if t .ds n1 \(dg
- .if n .ds n2 **
- .if t .ds n2 \(dd
- .lg 0
- .DA August 2, 1982
- .TL
- Description of a
- .UX
- \fI\kxa.out\h'|\nxu+2u'a.out\fP
- File
- .AU
- Matt Bishop\*(n2
- .AI
- Megatest Corporation
- 2900 Patrick Henry Drive
- Santa Clara, CA 95050
- .AB
- This document describes the format
- of executable files which run under
- the Fourth Berkeley Distribution of
- .UX ,
- Release 4.1.
- .AE
- .FS
- \*(n2Author's present address:
- Department of Computer Sciences,
- Purdue University,
- West Lafayette, IN 47907
- .FE
- .SH
- .ls 2
- Introduction
- .PP
- This document describes the format of
- .I a.out
- and object files.
- An
- .I a.out
- file is the executable output from the loader;
- an object file, the name of which usually ends
- in \fI.o\fP, is the relocatable input to the loader.
- The format described here is that of the
- Berkeley Software Distribution, Release 4.1
- (better known as ``4.1BSD'');
- be warned that
- there are substantial differences between this format
- and that of Bell Laboratories Version 7
- (and Version 32/V).
- .PP
- These notes were written to help those working
- on symbolic debuggers like
- .I adb (1)
- or
- .I sdb (1),
- or on disassemblers for object and
- .I a.out
- files, like the
- .I unas (1)
- series,
- as well as those who want to learn more about how
- .UX
- works.
- .PP
- .UX
- provides no convention for determining
- whether a number is written in hexadecimal,
- decimal, or octal;
- hence, this document follows the C convention:
- numbers with a leading `0' are octal,
- those with a leading `0x' are hexadecimal,
- and all others are decimal.
- .SH
- Overview
- .PP
- .I A.out
- and object files both use the same format.
- There are five parts to these files:
- .DS B
- .ls 1
- .ta \w'0. 'u
- 1. the header
- 2. the program and data segments
- 3. relocation information
- 4. the symbol table
- 5. the string table
- .ls
- .DE
- These are arranged in the above order.
- (Note that, once loaded, the data segment also
- includes the uninitialized data,
- called the
- .I bss ;
- it follows the initialized data and is set to 0.)
- Each of these will be discussed in the following sections.
- After that, a sample object file will be analyzed in
- detail to illustrate how all that has been discussed actually works.
- .SH
- Header
- .PP
- The header contains information needed to load the program.
- Specifically, its format is:
- .DS B
- .ls 1
- .ta \w' relocation size'u+6m
- magic number (4 bytes)
- text segment size (4 bytes)
- data segment size (4 bytes)
- bss segment size (4 bytes)
- symbol table size (4 bytes)
- entry point (4 bytes)
- text relocation size (4 bytes)
- data relocation size (4 bytes)
- .ls
- .DE
- .PP
- The magic number is a number which indicates
- how the program is to be executed.
- There are three possible values:
- .Bs OMAGIC "(value 0407)"
- This is the oldest kind of executable program.
- The text segment is neither write-protected nor shared;
- hence, there is no padding between it and the data segment.
- The text immediately follows the header.
- .Es
- .Bs NMAGIC "(value 0410)"
- With this magic number, the text segment is write-protected and
- shared between all processes executing the file.
- As a result, the data segment begins at the first
- block following the text segment.
- .Es
- .Bs ZMAGIC "(value 0413)"
- This is very much like the NMAGIC magic number;
- the only difference is that the text segment
- begins at the block following the block containing the header;
- the space between the end of the header
- and the beginning of the text segment is padded with zeroes.
- The sizes of both the text and data segments are multiples of the block size.
- In this format, the pages of the file are not pre-loaded but
- are loaded into the image as needed.
- .Es
- .LP
- These differences are summarized in figure 1, below.
- .Bi
- .ta \w'+----------------+ 'u +\w'+--------------+ 'u
- .if t .ds In \h'\w' 'u'information\h'\w' 'u'
- .if n .ds In information\
- .sp
- +----------------+ +----------------+ +----------------+
- | OMAGIC header | | NMAGIC header | | ZMAGIC header |
- +----------------+ +----------------+ +----------------+
- | text segment | | text segment | | padding to end |
- +----------------+ +----------------+ | of block |
- | data segment | | padding to end | +----------------+
- +----------------+ | of block | | text segment |
- | relocation | +----------------+ +----------------+
- | \*(In | | data segment | | padding to end |
- +----------------+ +----------------+ | of block |
- | symbol table | | relocation | +----------------+
- +----------------+ | \*(In | | data segment |
- | string table | +----------------+ +----------------+
- +----------------+ | symbol table | | padding to end |
- +----------------+ | of block |
- | string table | +----------------+
- +----------------+ | relocation |
- | \*(In |
- +----------------+
- | symbol table |
- +----------------+
- | string table |
- +----------------+
- .sp
- FIGURE 1. A Comparison of the Types of Executable Files
- .sp
- .Ei
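The three magic numbers described above can be told apart with a few comparisons. The following is a minimal sketch in C: the macro values are the ones given in the text, but the function names are hypothetical and are not taken from a.out.h.

```c
/* Magic numbers as given in the text (octal). */
#define OMAGIC 0407   /* text neither write-protected nor shared */
#define NMAGIC 0410   /* text write-protected and shared */
#define ZMAGIC 0413   /* like NMAGIC, but pages are loaded as needed */

/* Hypothetical helper: nonzero if the magic number is one of the three. */
int magic_is_known(long magic)
{
    return magic == OMAGIC || magic == NMAGIC || magic == ZMAGIC;
}

/* Hypothetical helper: nonzero if the text segment is shared. */
int text_is_shared(long magic)
{
    return magic == NMAGIC || magic == ZMAGIC;
}

/* Hypothetical helper: nonzero if pages are loaded into the image on demand. */
int is_demand_loaded(long magic)
{
    return magic == ZMAGIC;
}
```

A tool reading a file would presumably call magic_is_known() on the first four bytes of the header before trusting any of the other fields.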
- .PP
- The
- .I "entry point"
- is the point at which execution begins.
- The value stored is the point
- .I "in core"
- (not in the
- .I a.out
- file) where the process will start.
- To find the equivalent point in the
- .I a.out
- file, count from the beginning of
- the text segment rather than from the beginning of the file.
- .PP
- The header is not loaded when the file is executed.
- .PP
- The file
- .I /usr/include/a.out.h
- provides some useful macros for dealing with
- .I a.out
- file formats.
- The header structure is defined there as
- .DS B
- .ls 1
- .ta \w'#define'u +\w'unsigned 'u +\w'a_dirsize 'u +4n
- struct exec {
- long a_magic; /* magic number */
- unsigned a_text; /* size of text segment */
- unsigned a_data; /* size of initialized data */
- unsigned a_bss; /* size of uninitialized data */
- unsigned a_syms; /* size of symbol table */
- unsigned a_entry; /* entry point */
- unsigned a_trsize; /* size of text relocation */
- unsigned a_drsize; /* size of data relocation */
- };
- .ls
- .DE
- Also, the macro
- .B N_TXTOFF
- will return the offset into the file at
- which the text segment begins.
- Macros to locate the symbol and string tables (\c
- .B N_SYMOFF
- and
- .B N_STROFF ,
- respectively) are also given (see figure 2).
- Finally, the file provides another macro (\c
- .B N_BADMAG )
- that checks for an illegal magic number.
- .Bi
- .if t .ds He \h'\w' 'u'header\h'\w' 'u'
- .if n .ds He header\
- .if t .ds Ar " \(->
- .if n .ds Ar }
- .ds Dn \v'0.2v'
- .ds Up \v'-0.2v'
- .ta \w'\fBN_TXTOFF\fP> 'u +\w'+-----------------------+ 'u
- .sp
- macro size of segment
- +-----------------------+
- | | \*(Dn\(lt\*(Up 1024 bytes (\fBZMAGIC\fP)
- | \*(He | \(lk or
- | | \*(Up\(lb\*(Dn 32 bytes (the others)
- \fBN_TXTOFF\fP\*(Ar +-----------------------+
- | |
- | text segment | { \fIa_text\fP
- | |
- +-----------------------+
- | |
- | data segment | { \fIa_data\fP
- | |
- +-----------------------+
- | |
- | text relocation | { \fIa_trsize\fP
- | |
- +-----------------------+
- | |
- | data relocation | { \fIa_drsize\fP
- | |
- \fBN_SYMOFF\fP\*(Ar +-----------------------+
- | |
- | symbol table | { \fIa_syms\fP
- | |
- \fBN_STROFF\fP\*(Ar +-----------------------+
- | | \*(Dn\(lt\*(Up string table size is
- | string table | \(lk given by the first
- | | \*(Up\(lb\*(Dn 4 bytes in the table
- +-----------------------+
- .sp
- FIGURE 2. Structure of an \fIa.out\fP File
- .sp
- .Ei
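Figure 2 amounts to a running sum of the sizes stored in the header. The sketch below computes the three offsets that way. It is only an illustration: the structure aout_sizes and the function names are hypothetical stand-ins, not the actual definitions of N_TXTOFF, N_SYMOFF, and N_STROFF, and the constants 1024 and 32 are the header sizes shown in figure 2.

```c
#define ZMAGIC 0413   /* value from the text */

/* Hypothetical copy of the header fields, in the order given in the text. */
struct aout_sizes {
    long          magic;
    unsigned long text, data, bss, syms, entry, trsize, drsize;
};

/* Offset in the file at which the text segment begins (figure 2). */
unsigned long text_offset(const struct aout_sizes *h)
{
    return h->magic == ZMAGIC ? 1024UL : 32UL;
}

/* Offset of the symbol table: skip text, data, and both relocation segments. */
unsigned long symbol_offset(const struct aout_sizes *h)
{
    return text_offset(h) + h->text + h->data + h->trsize + h->drsize;
}

/* Offset of the string table: it directly follows the symbol table. */
unsigned long string_offset(const struct aout_sizes *h)
{
    return symbol_offset(h) + h->syms;
}
```

Note that the bss size does not enter into any of these sums, since figure 2 shows no bss region in the file.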
- .SH
- Text and Data Section
- .PP
- The
- .I text
- segment is the segment containing the machine code to be performed.
- In an object file, it contains a translation of the higher-level
- code in the corresponding source file;
- in an
- .I a.out
- file, it is the code in all the object files,
- preceded by the startup routine (which
- sets up the
- .I argc ,
- .I argv ,
- and
- .I environ
- pointers and then calls the procedure
- .I main() )
- and followed by all the library routines referenced in the program.
- .PP
- In object files, references may be made to variables, procedures, or
- labels not defined in that file (as, for example, using the C
- .I extern
- keyword).
- These are called
- .I "relocatable references" ;
- the text and data relocation segments record the
- addresses at which this has happened, along with additional information to aid
- in the relocation (see the section on relocation segments for more details).
- .PP
- The
- .I data
- segment provides space for both the
- initialized and uninitialized data;
- the uninitialized data is stored as 0.
- The stack grows downwards from
- the highest possible memory location
- (on the
- .SM
- VAX\c
- .LG
- , 0x7ffff400), and is automatically extended as needed.
- To extend the data segment, however, the
- .I brk (2)
- or
- .I sbrk (2)
- routine must be called.
- Again, if reference is made to an undefined symbol,
- the fact that that byte is (or those bytes are)
- relocatable is stored in the data relocation segment.
- .PP
- As indicated above, if the magic number is
- .B ZMAGIC ,
- the text segment begins on the first block after
- the block containing the header,
- and the sizes of the data and text segments
- are multiples of the block length.
- If the magic number is either
- .B ZMAGIC
- or
- .B NMAGIC ,
- the data segment begins at the first
- block boundary following the text segment.
- .SH
- Text and Data Relocation Segments
- .PP
- These segments tell the loader how to relocate
- bytes in the text and data segments.
- Each relocation datum uses the structure
- .DS B
- .ls 1
- .ta \w'#define'u +\w'unsigned 'u +\w'r_symbolnum:24, 'u +4n
- struct relocation_info {
- int r_address; /* address to relocate */
- unsigned r_symbolnum:24, /* symbol ordinal */
- r_pcrel:1, /* pc relative already */
- r_length:2, /* 0=byte,1=word,2=long */
- r_extern:1, /* sym val not included */
- :4; /* nothing, yet */
- };
- .ls
- .DE
- .PP
- Two pieces of information are used to relocate a location.
- The first is the contents of the location being relocated, and
- the second is the relocation datum in the relocation segment.
- The contents, when used, give an offset from the
- beginning of the segment with respect to which the location is relocated.
- In the relocation datum, the field
- .I r_address
- gives the address to be relocated;
- it is relative to the appropriate segment.
- The field
- .I r_symbolnum
- gives the number of the symbol table entry
- of the symbol which is being relocated.
- If it is relocated relative to the program counter,
- the bit
- .I r_pcrel
- is set;
- if the symbol's value is to be added to
- the contents of the location being relocated,
- the bit
- .I r_extern
- is set.
- Finally, the
- .I r_length
- field gives the type of the datum
- (a byte, a word, or a longword) to be relocated.
- The final, four-bit field is unused;
- it fills out the second longword of the relocation datum.
- This structure is illustrated in figure 3.
- .Bi
- .ta \w'+------------+ 'u
- .if t .ds Rl \|relocated\|
- .if n .ds Rl relocated\
- .sp
- symbol table relocation datum
- +------------+ +---------------------------------------+
- | symbol | | r_address |
- | to be | +-----------+-------+--------+--------+-+
- | \*(Rl<-r_symbolnum|r_pcrel|r_length|r_extern| |
- +------------+ +-----------+-------+--------+--------+-+
- .sp
- FIGURE 3. Picture of a Relocation Datum
- .sp
- .Ei
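The second longword of a relocation datum can also be taken apart with shifts and masks instead of bit fields. The sketch below assumes that the compiler allocates bit fields starting at the low-order bit (as VAX C compilers do), so that r_symbolnum occupies the low 24 bits; the structure reloc_fields and the function names are hypothetical.

```c
/* Hypothetical unpacked form of one relocation datum. */
struct reloc_fields {
    long          address;    /* r_address */
    unsigned long symbolnum;  /* r_symbolnum, 24 bits */
    unsigned      pcrel;      /* r_pcrel, 1 bit */
    unsigned      length;     /* r_length, 2 bits: 0=byte, 1=word, 2=long */
    unsigned      is_extern;  /* r_extern, 1 bit */
};

/* Split the two longwords of a datum; assumes low-order-first bit fields. */
struct reloc_fields unpack_reloc(long address, unsigned long word2)
{
    struct reloc_fields r;

    r.address   = address;
    r.symbolnum = word2 & 0xffffffUL;
    r.pcrel     = (unsigned)((word2 >> 24) & 1);
    r.length    = (unsigned)((word2 >> 25) & 3);
    r.is_extern = (unsigned)((word2 >> 27) & 1);
    return r;
}

/* Number of bytes covered by a relocation with the given r_length. */
unsigned reloc_size_bytes(unsigned length)
{
    return 1u << length;
}
```

Under the stated assumption, a datum with r_symbolnum 30, r_pcrel 1, r_length 2, and r_extern 1 would have 0x0D00001E as its second longword.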
- .PP
- See the example, below, for a detailed illustration of
- how the loader uses this information to relocate an address.
- .SH
- Symbol and String Tables
- .PP
- The contents of the symbol table vary,
- depending on the flags given to the compilers,
- such as
- .I cc (1),
- .I f77 (1),
- or
- .I pc (1),
- and the loader
- .I ld (1).
- The compiler flag that produces the most information is
- .I g ;
- without it, the usefulness of the debuggers
- .I adb (1)
- and
- .I sdb (1)
- is minimal.
- This section describes the symbol table,
- and in particular the format of the information
- placed there by the
- .I g
- flag.
- .PP
- The string table begins with an integer
- (four bytes) which gives its length.
- The names of the variables, each null-terminated, follow;
- every symbol table entry contains an index into this table.
- The string table exists because the 4.1BSD C compiler allows
- variable names of arbitrary length;
- thus, it is not possible to reserve space in advance within the
- symbol table entry itself for the variable name
- (as is done in Version 7 symbol table format).
- (A picture of all this is given in figure 4.)
- .Bi
- .ta +\w'+---------------------+ 'u +\w' 'u
- .sp
- symbol table entry string table
- +---------------------+ +------------+
- | n_un (name pointer)+ | _str1\e0 |
- +------+-------+------+ | +------------+
- |n_type|n_other|n_desc| +> _str2\e0 |
- +------+-------+------+ +------------+
- | n_value | | _str3\e0 |
- +---------------------+ +------------+
- .sp
- FIGURE 4. Symbol Table Entry and Associated Name
- .sp
- .Ei
- .PP
- The structure of a symbol table entry is
- .DS B
- .ls 1
- .ta \w'#define 'u +\w'char'u-1u +\w'unsigned 'u+1u +\w'*n_name 'u
- struct nlist {
- union {
- char *n_name; /* for use when in-core */
- long n_strx; /* index in file table */
- } n_un;
- unsigned char n_type; /* type flag (eg N_TEXT) */
- char n_other; /* unused */
- short n_desc; /* more detailed info */
- unsigned n_value; /* value (or sdb offset) */
- };
- .ls
- .DE
- Note the union;
- it can contain either the offset of the variable name
- from the start of the string table
- (this includes the leading four bytes of the table;
- hence, the first name has offset 4),
- or a pointer to the variable name.
- The first is most useful when the string table is not read
- into memory;
- however, in most applications it is easier to read it in,
- and then the union value is merely changed to be the
- in-core address of the variable name.
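Once the string table has been read into memory, turning an offset into a name is a bounds check and an addition. The sketch below is one way to do it; the function names are hypothetical, and it assumes that the four-byte length is stored low-order byte first (as on the VAX) and that the length counts those four bytes.

```c
#include <stddef.h>

/* Read the four-byte length at the start of the table (low byte first). */
unsigned long strtab_length(const unsigned char *tab)
{
    return (unsigned long)tab[0]
         | ((unsigned long)tab[1] << 8)
         | ((unsigned long)tab[2] << 16)
         | ((unsigned long)tab[3] << 24);
}

/*
 * Return the name at offset strx, or NULL if the offset falls inside the
 * length word or beyond the end of the table.  An offset of 0 (used by
 * entries that have no name) therefore yields NULL.
 */
const char *symbol_name(const unsigned char *tab, unsigned long strx)
{
    unsigned long len = strtab_length(tab);

    if (strx < 4 || strx >= len)
        return NULL;
    return (const char *)tab + strx;
}
```

Replacing the offset in the union with the pointer returned here corresponds to the conversion to an in-core address described above.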
- .PP
- There are four other parts to a symbol table entry;
- the contents of each part depend on the type of the symbol.
- This is recorded in the
- .I n_type
- field.
- .PP
- The
- .I n_type
- field,
- as its name suggests, contains the type of the symbol.
- It also contains the segment into which the symbol is placed.
- Possible values are:
- .DS B
- .ls 1
- .ta \w'N_COMM 'u +\w'(0x00) 'u
- N_UNDF (0x00) not associated with any segment
- N_EXT (0x01) defined externally
- N_ABS (0x02) absolute address (value) given
- N_TEXT (0x04) located in the text segment
- N_DATA (0x06) located in the initialized data segment
- N_BSS (0x08) located in the uninitialized data segment
- N_COMM (0x12) names a common region (internal to loader)
- N_FN (0x1f) names a file
- .ls
- .DE
- Note that there are three ``miscellaneous'' possibilities:
- if the symbol is a file name,
- .I n_type
- is set to
- .B N_FN ;
- if it names a common region,
- .I n_type
- is set to
- .B N_COMM
- (this value is set, and used, only by the loader);
- and if the symbol is external, the mask
- .B N_EXT
- is or'ed in.
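Because N_EXT is a mask or'ed into the type, a program reading the symbol table has to separate the two. The sketch below shows one way to do so. N_EXT is the value listed above; the masks N_TYPE (0x1e) and N_STAB (0xe0) do not appear in this document and are assumptions here, chosen to match the values in the table above and the debugger entries described next. The function names are hypothetical.

```c
#define N_EXT  0x01   /* external bit, from the table above */
#define N_TYPE 0x1e   /* assumed mask for the segment bits */
#define N_STAB 0xe0   /* assumed mask: any bit set marks a debugger entry */

/* Nonzero if the entry is one of the debugger (g flag) entries. */
int is_debugger_entry(unsigned char n_type)
{
    return (n_type & N_STAB) != 0;
}

/* Nonzero if an ordinary (non-debugger) symbol is external. */
int is_external(unsigned char n_type)
{
    return !is_debugger_entry(n_type) && (n_type & N_EXT) != 0;
}

/* The segment part of an ordinary symbol's type, with N_EXT removed. */
unsigned segment_type(unsigned char n_type)
{
    return n_type & N_TYPE;
}
```

For example, a type of 0x5 splits into N_TEXT (0x04) with the external bit set.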
- .PP
- As indicated above, by using the
- .I g
- flag when compiling and loading, it is
- possible to get more detailed type declarations.
- What the other fields of the symbol table entry mean
- depends on the value of
- .I n_type .
- Values defined in 4.1BSD are:
- .Bs N_GSYM "(value 0x20)"
- The symbol is a global symbol.
- The
- .I n_other
- and
- .I n_value
- fields are 0;
- the
- .I n_un
- field names the symbol;
- the
- .I n_desc
- field gives a more detailed type description.
- .Es
- .Bs N_FNAME "(value 0x22)"
- The symbol is a
- .SM
- FORTRAN-77
- .LG
- procedure name.
- The
- .I n_other
- field is 0;
- the
- .I n_un
- field names the procedure;
- the other fields are meaningless.
- .Es
- .Bs N_FUN "(value 0x24)"
- The symbol is the name of a procedure.
- The
- .I n_un
- field names the procedure;
- the
- .I n_other
- field is 0;
- the
- .I n_desc
- field contains the source file line number where it begins,
- and the
- .I n_value
- field contains the address in the object file
- where the procedure begins.
- .Es
- .Bs N_STSYM "(value 0x26)"
- The symbol is the name of a static symbol.
- The
- .I n_un
- field names the symbol;
- the
- .I n_other
- field is 0;
- the
- .I n_value
- field gives the variable's address,
- and the
- .I n_desc
- field contains a more detailed type description.
- .Es
- .Bs N_LCSYM "(value 0x28)"
- The symbol is a symbol local to the file and
- allocated in the uninitialized data segment.
- The symbol table parts have the same meaning as for
- .B N_STSYM .
- .Es
- .Bs N_PC "(value 0x30)"
- This is a type produced by the Berkeley
- .I pc (1)
- compiler.
- It is used to check types across separately compiled files.
- The
- .I n_un
- field contains the name of the symbol, the
- .I n_other
- field is 0, the
- .I n_value
- field contains the line number on
- which it occurs, and the
- .I n_desc
- field contains type information.
- Values of the
- .I n_desc
- field are:
- .DS B
- .ls 1
- .ta \w'0. 'u +\w'included file name 'u +\w'06. 'u
- .Rt
- 1. source file name \06. global variable
- 2. included file name \07. global function
- 3. global label \08. global procedure
- 4. global constant \09. external function
- 5. global type \o'\01'0. external procedure
- .ls
- .DE
- .Es
- .Bs N_RSYM "(value 0x40)"
- The symbol is one allocated to a register;
- the symbol table parts have the same meaning as for
- .B N_STSYM ,
- except that
- .I n_value
- is the register number, not the address.
- .Es
- .Bs N_SLINE "(value 0x44)"
- This marks the location where a new source line begins.
- Both the
- .I n_un
- and
- .I n_other
- fields are 0,
- .I n_desc
- contains the source line number, and
- .I n_value
- contains the address corresponding to the beginning of the line.
- .Es
- .Bs N_SSYM "(value 0x60)"
- This holds information about an element of a structure.
- The symbol table entry is interpreted just as that of
- .B N_STSYM ,
- except that
- .I n_value
- is the offset to be added to the variable's address to obtain
- the address of this structure element for that variable.
- .Es
- .Bs N_SO "(value 0x64)"
- The name of a source file (\c
- .I not
- an include file) has its
- .I n_type
- field set to this value.
- The
- .I n_un
- field is the name of the source file,
- .I n_other
- and
- .I n_desc
- are 0, and
- .I n_value
- is the address corresponding to the beginning of the source file.
- .Es
- .Bs N_LSYM "(value 0x80)"
- Local symbols have this as the value of the
- .I n_type
- field.
- The
- .I n_un ,
- .I n_other ,
- and
- .I n_desc
- fields are the same as for
- .B N_STSYM ,
- and the
- .I n_value
- field is an offset into the stack
- (recall that C puts local variables on the system stack).
- .Es
- .Bs N_SOL "(value 0x84)"
- This indicates that the symbol represents a file which
- is included by an ``#include'' statement.
- Its symbol table interpretation is identical to that of
- .B N_SO .
- .Es
- .Bs N_PSYM "(value 0xa0)"
- A symbol with this entry is a subroutine, procedure, or function parameter.
- Since parameters are pushed onto the system stack,
- these entries are interpreted just like those of
- .B N_LSYM .
- .Es
- .Bs N_ENTRY "(value 0xa4)"
- This represents an alternate entry point;
- its interpretation is just like that of
- .B N_FUN .
- .Es
- .Bs N_LBRAC "(value 0xc0)"
- This marks the occurrence of a left curly brace (`{').
- The
- .I n_un
- and
- .I n_other
- fields are 0, the
- .I n_value
- field contains the address corresponding to the left bracket,
- and the
- .I n_desc
- field contains the nesting level.
- .Es
- .Bs N_RBRAC "(value 0xe0)"
- This marks the occurrence of a right curly brace (`}').
- All fields are interpreted as for
- .B N_LBRAC .
- .Es
- .Bs N_BCOMM "(value 0xe2)"
- This signals the beginning of common.
- Only the name of the common is given
- (in the
- .I n_un
- field);
- the rest of the entry is meaningless.
- .Es
- .Bs N_ECOMM "(value 0xe4)"
- This signals the end of common.
- As with
- .B N_BCOMM ,
- only the name of the common is given
- (in the
- .I n_un
- field);
- the rest of the entry is meaningless.
- .Es
- .Bs N_ECOML "(value 0xe8)"
- This signals the end of a common block local to the routine using it;
- all fields except
- .I n_value ,
- which contains the address of that reference, are meaningless.
- .Es
- .Bs N_LENG "(value 0xfe)"
- This entry contains the number of bytes to allocate to a variable.
- The variable name is in the
- .I n_un
- field, the
- .I n_desc
- field is 0, and the variable's length in bytes is in the
- .I n_value
- field.
- The
- .I n_other
- field is 1 to indicate this is a length entry.
- .Es
- .PP
- Most of the above formats use
- .I n_desc
- to describe the type further.
- In these cases, that field is divided into seven parts,
- as described by the structure
- .I desc :
- .DS B
- .ls 1
- .ta \w'#define\ 'u +\w'unsigned\ \ 'u +\w'basic:4 'u
- struct desc { /* fields listed from high-order to low-order bits */
- unsigned q6:2, /* least significant modifier (high-order bits) */
- q5:2,
- q4:2,
- q3:2,
- q2:2,
- q1:2, /* most significant modifier */
- basic:4; /* basic type (low-order 4 bits) */
- };
- .ls
- .DE
- The value of the basic field is the basic type of the symbol;
- the sixteen possible values are:
- .DS B
- .ls 1
- .ta \w'0. 'u +\w'function argument 'u +\w'00. 'u
- 0. undefined \08. struct
- 1. function argument \09. union
- 2. char 10. enum
- 3. short int 11. member of enum
- 4. int 12. unsigned char
- 5. long int 13. unsigned short int
- 6. float 14. unsigned int
- 7. double 15. unsigned long int
- .ls
- .DE
- The
- .I n_desc
- field allows up to six modifiers, in the
- .I q
- fields (\c
- .I q1
- is the most significant and
- .I q6
- the least significant),
- chosen from among:
- .DS B
- .ls 1
- .ta \w'0. 'u +\w'pointer to 'u +\w'0. 'u
- 0. none 2. function returning
- 1. pointer to 3. array of
- .ls
- .DE
- Thus, for example, the variable
- .I xxx
- which is declared as
- .DS C
- char (**(*xxx)())[]
- .DE
- would have the value 0x3592 in its
- .I n_desc
- field; that is,
- .I q6
- is 0,
- .I q5
- is 3 (array of),
- .I q4
- is 1 (pointer to),
- .I q3
- is 1 (pointer to),
- .I q2
- is 2 (function returning),
- .I q1
- is 1 (pointer to), and
- .I basic
- is 2 (char);
- in English, then,
- .I xxx
- is ``a pointer to a function returning a pointer to a pointer
- to an array of chars.''
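The decomposition of 0x3592 given above can be reproduced with shifts and masks. In the sketch below, the basic type is taken from the low-order four bits and each modifier from the next two bits up, which is the layout implied by the value 0x3592; the structure type_desc and the function decode_desc are hypothetical names.

```c
/* Hypothetical decoded form of n_desc: q[0] is q1, ..., q[5] is q6. */
struct type_desc {
    unsigned basic;
    unsigned q[6];
};

/* Split an n_desc value into its basic type and six modifiers. */
struct type_desc decode_desc(unsigned n_desc)
{
    struct type_desc d;
    int i;

    d.basic = n_desc & 0xf;                   /* low-order four bits */
    for (i = 0; i < 6; i++)
        d.q[i] = (n_desc >> (4 + 2 * i)) & 3; /* q1 sits just above basic */
    return d;
}
```

Reading q[0] upwards until a zero modifier is reached gives the declaration in the same order as the English phrase: pointer to, function returning, pointer to, pointer to, array of, and finally the basic type char.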
- .SH
- Example
- .PP
- In this section, we will examine an object file;
- this will show how the information discussed in the
- preceding sections is actually used.
- The program being discussed is a very simple one;
- given a file name as an argument,
- it determines if the directories on the path are
- searchable and if so, whether the
- named file exists:
- .DS B
- .ls 1
- .ta \w'char **a'u +\w'if (argc'u
- /*
- .ti +\w'/'u
- * program to see if a file can be found
- .ti +\w'/'u
- *
- .ti +\w'/'u
- * only arg is file's path name
- .ti +\w'/'u
- */
- extern int errno;
-
- main(argc, argv)
- int argc;
- char **argv;
- {
- register int i;
- char *oops = "error is";
-
- if (argc != 2){
- printf("%s: too few arguments\en", argv[0]);
- exit(0);
- }
-
- if (access(argv[1], 0) == -1)
- perror(oops);
-
- exit(errno);
- }
- .ls
- .DE
- First, let us look at the symbol and string tables
- of the object file.
- Using
- .I osho (1),
- a program which dumps the parts of an object (or executable)
- file without any interpretation,
- we see that the symbol table is:
- .DS B
- .ls 1
- .ta \w'00: 'u +\w'00, 0x000, 0, 00, 0x000 'u +\w'00: 'u
- \00: 4, 0x64, 0, 0, 0x0 17: 0, 0xc0, 0, 2, 0xc
- \01: 8, 0x20, 0, 4, 0x0 18: 0, 0x44, 0, 16, 0x12
- \02: 14, 0xfe, 1, 0, 0x4 19: 59, 0x1, 0, 0, 0x0
- \03: 20, 0x24, 0, 9, 0x0 20: 0, 0x44, 0, 17, 0x25
- \04: 25, 0x5, 0, 0, 0x0 21: 67, 0x1, 0, 0, 0x0
- \05: 31, 0xa0, 0, 4, 0x4 22: 0, 0x44, 0, 18, 0x2e
- \06: 36, 0xfe, 1, 0, 0x4 23: 0, 0x44, 0, 19, 0x2e
- \07: 41, 0xa0, 0, 82, 0x8 24: 0, 0x44, 0, 20, 0x2e
- \08: 46, 0x2, 0, 0, 0x800 25: 73, 0x1, 0, 0, 0x0
- \09: 0, 0x44, 0, 11, 0x2 26: 0, 0x44, 0, 21, 0x47
- 10: 0, 0x44, 0, 12, 0x4 27: 81, 0x1, 0, 0, 0x0
- 11: 50, 0x40, 0, 4, 0xb 28: 0, 0x44, 0, 22, 0x51
- 12: 52, 0xfe, 1, 0, 0x4 29: 0, 0x44, 0, 23, 0x51
- 13: 0, 0x44, 0, 13, 0x4 30: 89, 0x1, 0, 0, 0x0
- 14: 54, 0x80, 0, 18, 0x4 31: 0, 0x44, 0, 24, 0x5e
- 15: 0, 0x44, 0, 14, 0xc 32: 0, 0xe0, 0, 2, 0x5e
- 16: 0, 0x44, 0, 15, 0xc
- .ls
- .DE
- .PP
- The numbers before the colons indicate the ordering of
- symbol table entries.
- In each quintuplet, the first number is the offset into the
- string table of the name of the variable (see figure 4, above);
- the second (which
- .I osho (1)
- prints in decimal, but which has been translated into hexadecimal here)
- is the type;
- the third,
- .I n_other ;
- the fourth,
- .I n_desc ;
- and the last,
- .I n_value .
- .PP
- In order to interpret this table properly,
- we need to know what the string table looks like;
- a listing of it, in which each entry is preceded
- by its offset in the table, is:
- .DS B
- .ls 1
- .ta \w'00: 'u +\w'errno 'u +\w'00: 'u +\w'argc 'u +\w'00: 'u +\w'_printf 'u +\w'00: 'u
- \04: x.c 31: argc 50: i 67: _exit
- \08: errno 36: argc 52: i 73: _access
- 14: errno 41: argv 54: oops 81: _perror
- 20: main 46: L13 59: _printf 89: _errno
- 25: _main
- .ls
- .DE
- (Recall the first four bytes give the length of the
- table.)
- .PP
- We can now interpret the symbol table. Entry 0, for example, is
- for ``x.c'' (the name in the string table at offset 4),
- which, as the field
- .I n_type
- indicates, is a source file name, and the machine code for it
- begins at location 0
- (recall that since the header is not loaded,
- this means the beginning of the text segment).
- To take a more complicated example, look at entry 11,
- which refers to the symbol ``i''.
- It is a register variable (\c
- .I n_type
- is N_RSYM, or 0x40)
- to which r11 is allocated
- (from the
- .I n_value
- field, which is hexadecimal
- .I b ,
- or decimal 11),
- and is declared to be an integer
- (as the
- .I n_desc
- field has value 4, the basic type is
- .I int ,
- and there are no qualifications).
- The next entry, number 12, contains a bit more information about ``i'';
- it is four bytes long.
- As a final example, look at entry 28.
- It marks the address (that is, 0x51 in the object file)
- which corresponds to the beginning of a source line
- (namely, line 22 in the source file).
- Since, according to the next entry,
- line 23 also begins at that address,
- line 22 is probably blank.
- Indeed, that is the case.
- .PP
- Now, let's look at the way these symbols interact with the
- text and relocation segments.
- Here is partial output from the
- .SM
- VAX\c
- .LG
- -11/750 disassembler
- .I unas75 (1)
- that corresponds to the text segment;
- it has been edited for the convenience of the reader.
- .DS B
- .ls 1
- .ds Ap \o"' "
- .ta \w#0x00: #u +\w#00 00 00 00' #u +7m +6m +10m
- 0x00: 00 08 _main: .word ^M<r11> ; x.c
- 0x02: 11 5b brb 0x5f ; line 11
- 0x04: de ef 5a\*(Ap00\*(Ap moval 0x64,oops ; lines 12 - 13
- 00\*(Ap00\*(Apad fc
- 0x0c: d1 ac 04 02 cmpl argc,$2 ; lines 14 - 15
- 0x10: 13 1c beql 0x2e
- 0x12: d0 ac 08 50 movl argv,r0 ; line 16
- 0x16: dd 60 pushl (r0)
- 0x18: dd 8f 6d\*(Ap00\*(Ap pushl $109
- 00\*(Ap00\*(Ap
- 0x1e: fb 02 ef db\*(Ap calls $2,_printf
- ff\*(Apff\*(Apff\*(Ap
- 0x25: dd 00 pushl $0 ; line 17
- 0x27: fb 01 ef d2\*(Ap calls $1,_exit
- ff\*(Apff\*(Apff\*(Ap
- 0x2e: dd 00 pushl $0 ; lines 18 - 20
- 0x30: d0 ac 08 50 movl argv,r0
- 0x34: dd a0 04 pushl 4(r0)
- 0x37: fb 02 ef c2\*(Ap calls $2,_access
- ff\*(Apff\*(Apff\*(Ap
- 0x3e: d1 50 8f ff cmpl r0,$-1
- ff ff ff
- 0x45: 12 0a bneq 0x51
- 0x47: dd ad fc pushl oops ; line 21
- 0x4a: fb 01 ef af\*(Ap calls $1,_perror
- ff\*(Apff\*(Apff\*(Ap
- 0x51: dd ef a9\*(Apff\*(Ap pushl _errno ; lines 22 - 23
- ff\*(Apff\*(Ap
- 0x57: fb 01 ef a2\*(Ap calls $1,_exit
- ff\*(Apff\*(Apff\*(Ap
- 0x5e: 04 ret ; line 24
- 0x5f: c2 04 5e subl2 $4,sp
- 0x62: 11 a0 brb 0x4
- .ls
- .DE
- The leftmost column contains the address of the instruction,
- the second column the actual machine code,
- and the rest of the line is the equivalent assembly language code.
- The comments, which begin with `;', reflect the contents of the symbol
- table.
- .PP
- This also shows how relocation data is used.
- (The apostrophes after the bytes in the machine code column indicate
- that that byte is relocatable.)
- The relocation bits for the text segment are:
- .DS B
- .ls 1
- .ta \w'0: 'u +\w'00, 00, 0, 0, 0 'u +\w'0: 'u
- 0: 6, 6, 1, 2, 0 4: 58, 25, 1, 2, 1
- 1: 26, 6, 0, 2, 0 5: 77, 27, 1, 2, 1
- 2: 33, 19, 1, 2, 1 6: 83, 30, 1, 2, 1
- 3: 42, 21, 1, 2, 1 7: 90, 21, 1, 2, 1
- .ls
- .DE
- Bearing in mind that the libraries used by the startup routine,
- mainly the standard library, occupy 388 bytes of data space,
- let us calculate the value put in for ``_errno'' in the executable file.
- After creating the executable file, we can examine the header block
- (using either
- .I osho (1)
- or
- .I unas (1)),
- and we see the text segment is 5120 bytes long.
- As there are 388 bytes of data loaded at the head of the data segment
- by the startup routine, then, the final address of the variable ``_errno''
- is 5120 + 388 or 5508 (that is, 0x1584).
- Now, the startup routine is 60 bytes long;
- the value in the word to be relocated is \-87;
- hence, the final value put in that word is
- 5508 \- 60 + (\-87) = 5361, or 0x14f1.
- That's all there is to it.
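The arithmetic above can be written as a one-line function. This is only a sketch of the calculation for this one case (a pc-relative reference to an external symbol), not the loader's actual code; the function name and its parameter names are hypothetical.

```c
/*
 * Value stored by the loader for a pc-relative reference to an external
 * symbol: the old contents of the location, plus the final address of the
 * symbol, minus the amount by which the object file's text was moved.
 */
long relocate_pcrel_extern(long contents, long symbol_address, long text_shift)
{
    return contents + symbol_address - text_shift;
}
```

With the figures from the example, relocate_pcrel_extern(-87, 5120 + 388, 60) gives 5361, or 0x14f1. As a quick check, the instruction following the operand begins at 0x57 in the object file and so at 60 + 0x57 = 147 in the executable, and 147 + 5361 = 5508, the address of ``_errno''.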
- .SH
- Conclusion
- .PP
- This note is a collection of information
- about object and
- .I a.out
- files.
- Its purpose is to enable those who need this information
- to find it quickly and easily,
- rather than having to dig through the manual and
- the source code for several assorted programs.
- .PP
- For most applications,
- this document contains far more details than are necessary.
- Indeed, as the meanings of the symbols in the symbol table change to allow
- more sophistication among
- .UX
- debuggers,
- some of the information here may become incomplete or obsolete.
- However, knowledge of the relocatable and executable file
- formats is essential for developing new
- .UX
- tools and new versions of
- .UX ,
- and for porting
- .UX
- to new computers;
- this paper is intended to summarize
- the information scattered throughout other
- .UX
- documents.
- .SH
- Acknowledgements
- .PP
- This paper sprang from work done in the summer of 1982
- at Megatest Corporation.
- Mike DeMoney, Dave Emberson, Gary Fine, Steve Stone, and Mike Yip
- all contributed to this document; my thanks to them all.
- Thanks also to all my co-workers at Megatest;
- without such a pleasant and creative environment, this document
- would never have been written.
- .SH
- Sources
- .PP
- The manual page for
- .I a.out (5)
- contains information on the layout of an executable file;
- .I stabs (5)
- describes the meaning of entries in the symbol table
- when the
- .I g
- flag is given.
- The source code for the loader program
- .I ld.c
- and for the
- .I unas (1)
- disassemblers (Version 1.2)
- was invaluable.
- .SH
- Appendix \- Format of a
- .UX
- Version 7
- \kx\fIa.out\fP\h'|\nxu+2u'\fIa.out\fP
- File
- .PP
- This section provides a brief summary of the format
- of a
- .UX
- Version 7
- .I a.out
- file;
- it is intended for those who are
- interested in the format of such
- files on the system from which
- 4.1BSD was developed.
- The main difference, of course, is that Version 7 was designed
- for 16-bit machines in general (and the
- \s-2PDP-11\s0\*(n1 in particular).
- .FS
- \*(n1\s-2PDP\s0 is a Trademark of Digital Equipment Corporation.
- .FE
- .PP
- The header for this version is similar to
- that of the 4.1BSD version;
- the only significant difference is that
- only the existence, and not the size, of the
- relocation segment or segments is recorded.
- The structure for the header is
- .DS B
- .ls 1
- .ta \w'#define'u +\w'unsigned 'u +\w'a_dirsize 'u +4n
- struct exec {
- 	int	a_magic;	/* magic number */
- 	unsigned	a_text;	/* size of text segment */
- 	unsigned	a_data;	/* size of initialized data */
- 	unsigned	a_bss;	/* size of uninitialized data */
- 	unsigned	a_syms;	/* size of symbol table */
- 	unsigned	a_entry;	/* entry point */
- 	unsigned	a_unused;	/* not used */
- 	unsigned	a_flag;	/* relocation info stripped */
- };
- .ls
- .DE
- Note that each of these fields is 2 bytes long,
- so the header occupies 16 bytes (half the size of
- the 4.1BSD header).
- The sizes are in bytes, and are even.
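- .PP
- As a rough illustration, the header might be decoded from the first
- 16 bytes of a file along the following lines.
- The names \fBv7_exec\fP, \fBv7_word\fP, and \fBv7_read_header\fP
- are invented for this sketch and appear in no
- .UX
- source;
- the sketch assumes that each 2-byte field is stored low byte first,
- as on the \s-2PDP-11\s0,
- and it substitutes fixed-width unsigned types for \fBint\fP and \fBunsigned\fP
- so that the field sizes hold on machines where an \fBint\fP is not 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define V7_HDR_SIZE 16          /* eight 2-byte fields */

/* Hypothetical mirror of the Version 7 header with explicit 16-bit fields. */
struct v7_exec {
    uint16_t a_magic;           /* magic number */
    uint16_t a_text;            /* size of text segment */
    uint16_t a_data;            /* size of initialized data */
    uint16_t a_bss;             /* size of uninitialized data */
    uint16_t a_syms;            /* size of symbol table */
    uint16_t a_entry;           /* entry point */
    uint16_t a_unused;          /* not used */
    uint16_t a_flag;            /* relocation info stripped */
};

/* Assemble one 16-bit word, assuming it is stored low byte first. */
static uint16_t v7_word(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the first V7_HDR_SIZE bytes of a file image into *h. */
static void v7_read_header(const unsigned char *buf, struct v7_exec *h)
{
    assert(buf != NULL && h != NULL);
    h->a_magic  = v7_word(buf + 0);
    h->a_text   = v7_word(buf + 2);
    h->a_data   = v7_word(buf + 4);
    h->a_bss    = v7_word(buf + 6);
    h->a_syms   = v7_word(buf + 8);
    h->a_entry  = v7_word(buf + 10);
    h->a_unused = v7_word(buf + 12);
    h->a_flag   = v7_word(buf + 14);
}
```

- .PP
- Decoding byte by byte, rather than reading the file directly into the
- structure, keeps the result independent of the byte order and padding
- rules of the machine doing the reading.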
- .PP
- The starting address of each segment may be computed from this
- structure, as follows:
- .DS B
- .ls 1
- .ta \w'relocation information 'u +\w'beg'u
- \fIpart of file\fP	\fIbegins at location\fP
- text segment	020
- data segment	020 + \fBa_text\fP
- relocation information	020 + \fBa_text\fP + \fBa_data\fP
- 		\fIif present\fP
- symbol table	020 + \fBa_text\fP + \fBa_data\fP
- 		\fIwithout relocation information\fP
- symbol table	020 + 2 * (\fBa_text\fP + \fBa_data\fP)
- 		\fIwith relocation information\fP
- .ls
- .DE
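- .PP
- The table above can be turned into a small routine, sketched here.
- The structure \fBv7_offsets\fP and the function \fBv7_file_offsets\fP
- are hypothetical names chosen for the example;
- the sketch further assumes that a nonzero \fBa_flag\fP means the
- relocation information has been stripped, as the comment in the
- header structure suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define V7_HDR_SIZE 020L        /* header length: 16 bytes */

/* File offsets of each part; names are invented for this sketch. */
struct v7_offsets {
    long text;                  /* start of text segment */
    long data;                  /* start of initialized data */
    long reloc;                 /* start of relocation info, -1 if absent */
    long syms;                  /* start of symbol table */
};

/*
 * Compute the offsets from the header fields.
 * Assumption: a_flag != 0 means relocation information was stripped.
 */
static void v7_file_offsets(uint16_t a_text, uint16_t a_data,
                            uint16_t a_flag, struct v7_offsets *out)
{
    /* Use long arithmetic so the sums cannot wrap at 16 bits. */
    long body = (long)a_text + (long)a_data;

    assert(out != NULL);
    out->text = V7_HDR_SIZE;
    out->data = V7_HDR_SIZE + (long)a_text;
    if (a_flag == 0) {
        /* One relocation word per word of text and data. */
        out->reloc = V7_HDR_SIZE + body;
        out->syms  = V7_HDR_SIZE + 2 * body;
    } else {
        out->reloc = -1;
        out->syms  = V7_HDR_SIZE + body;
    }
}
```

- .PP
- For instance, with \fBa_text\fP of 0100 and \fBa_data\fP of 040,
- the symbol table would begin at byte 0320 when relocation information
- is present and at byte 0160 when it is not.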
- .PP
- There are four magic numbers;
- each causes the file to be loaded differently.
- .Bs A_MAGIC1 "(value 0407)"
- This is the same as the
- .B OMAGIC
- magic number in 4.1BSD.
- .Es
- .Bs A_MAGIC2 "(value 0410)"
- The data segment in files with this magic number
- begins at the first 8K-byte boundary
- following the end of the text segment
- (8K bytes being the size of one \s-2PDP-11\s0 memory segment);
- as for
- .B NMAGIC ,
- the text segment is write-protected and shared by
- all processes executing the file.
- .Es
- .Bs A_MAGIC3 "(value 0411)"
- This is just like
- .B A_MAGIC2 ,
- except that the instruction and data spaces are
- separate; both begin at location 0.
- .Es
- .Bs A_MAGIC4 "(value 0405)"
- This is used for overlays;
- the text segment is overlaid on an existing text segment
- (from a file with magic number
- .B A_MAGIC3
- or
- .B A_MAGIC4 )
- while the existing data segment is preserved.
- .Es
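- .PP
- A loader or file inspector might sort the four magic numbers into
- their load-time properties roughly as follows.
- The structure \fBv7_load_mode\fP and the function \fBv7_classify_magic\fP
- are hypothetical;
- the sketch assumes that a file with magic number \fBA_MAGIC1\fP,
- like an \fBOMAGIC\fP file, has text that is neither write-protected nor shared,
- and it records nothing beyond the overlay property for \fBA_MAGIC4\fP.

```c
#include <assert.h>
#include <stddef.h>

#define A_MAGIC1 0407           /* same as OMAGIC in 4.1BSD */
#define A_MAGIC2 0410           /* write-protected, shared text */
#define A_MAGIC3 0411           /* as A_MAGIC2, separate I and D spaces */
#define A_MAGIC4 0405           /* overlay */

/* Load-time properties; the structure is invented for this sketch. */
struct v7_load_mode {
    int shared_text;            /* text is write-protected and shared */
    int split_id;               /* instruction and data spaces separate */
    int overlay;                /* text overlays an existing text segment */
};

/* Fill *m and return 1 for a known magic number; return 0 otherwise. */
static int v7_classify_magic(unsigned magic, struct v7_load_mode *m)
{
    assert(m != NULL);
    m->shared_text = 0;
    m->split_id = 0;
    m->overlay = 0;

    switch (magic) {
    case A_MAGIC1:              /* assumed: plain, unshared text */
        return 1;
    case A_MAGIC2:
        m->shared_text = 1;
        return 1;
    case A_MAGIC3:
        m->shared_text = 1;
        m->split_id = 1;
        return 1;
    case A_MAGIC4:              /* only the overlay property is recorded */
        m->overlay = 1;
        return 1;
    default:
        return 0;
    }
}
```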
- .PP
- As on the
- .SM
- VAX\c
- .LG
- , the stack begins at the highest possible location
- (0177776 on the \s-2PDP-11\s0)
- and grows down; the stack is automatically
- extended as required,
- but the data segment is extended only when
- .I sbrk (2)
- is called.
- .PP
- Relocation data amounts to one word per word of the text and
- (initialized) data segments.
- The high 12 bits of the word contain the ordinal of the symbol
- in the symbol table (remember, the first symbol is number 0!)
- being relocated
- (this field is meaningful only for references to external symbols);
- the low order bit indicates whether the reference is pc-relative;
- and bits 3, 2, and 1 indicate the segment referred to by the relocation word,
- with values
- .DS B
- .ls 1
- .ta \w'000 'u
- 00	absolute
- 01	relative to the text segment
- 02	relative to (initialized) data segment
- 03	relative to bss (uninitialized data) segment
- 04	undefined external symbol
- .ls
- .DE
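- .PP
- The three fields of a relocation word can be separated with shifts and
- masks, as in the sketch below.
- The structure \fBv7_reloc\fP, the function \fBv7_decode_reloc\fP, and the
- \fBV7_R_\fP constants are names made up for the example;
- the segment values are those of the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Segment codes from bits 3-1; the constant names are hypothetical. */
enum {
    V7_R_ABS  = 00,             /* absolute */
    V7_R_TEXT = 01,             /* relative to the text segment */
    V7_R_DATA = 02,             /* relative to initialized data */
    V7_R_BSS  = 03,             /* relative to bss */
    V7_R_EXT  = 04              /* undefined external symbol */
};

/* Decoded relocation word; the structure is invented for this sketch. */
struct v7_reloc {
    unsigned symbol;            /* bits 15-4: ordinal in the symbol table */
    unsigned segment;           /* bits 3-1: one of the V7_R_ codes */
    int pc_relative;            /* bit 0: nonzero if pc-relative */
};

/* Split one 16-bit relocation word into its three fields. */
static void v7_decode_reloc(uint16_t word, struct v7_reloc *r)
{
    assert(r != NULL);
    r->symbol      = (word >> 4) & 07777u;
    r->segment     = (word >> 1) & 07u;
    r->pc_relative = word & 01u;
}
```

- .PP
- A relocation word of 0131, for example, decodes to symbol number 5,
- segment code 04 (undefined external symbol), and a pc-relative reference.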
- .PP
- The symbol table entries are quite different from those of 4.1BSD.
- Their structure is
- .DS B
- .ls 1
- .ta \w'#define 'u +\w'unsigned 'u +\w'n_name[8]; 'u
- struct nlist {	/* symbol table entry */
- 	char	n_name[8];	/* symbol name */
- 	int	n_type;	/* type flag */
- 	unsigned	n_value;	/* value */
- };
- .ls
- .DE
- Note that a name may have at most 8 characters;
- no string table is necessary since the name can be stored
- in the entry itself.
- Legal types are
- .DS B
- .ls 1
- .ta \w'N_MMMM 'u +\w'(000) 'u
- \fBN_UNDF\fP	(000)	not associated with any segment
- \fBN_ABS\fP	(001)	absolute address (value) given
- \fBN_TEXT\fP	(002)	located in the text segment
- \fBN_DATA\fP	(003)	located in the initialized data segment
- \fBN_BSS\fP	(004)	located in the uninitialized data segment
- \fBN_REG\fP	(024)	register name
- \fBN_FN\fP	(037)	names a file
- \fBN_EXT\fP	(040)	defined externally
- .ls
- .DE
- Note that the external type may be or'ed in with any of the others.
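- .PP
- A debugger reading these entries might separate the external bit from
- the basic type, and recover the name, along the lines sketched here.
- The mask \fBV7_N_TYPE\fP and the three functions are hypothetical names;
- the mask value 037 is an assumption based on every type listed above
- fitting in the low five bits,
- and the sketch also assumes that names shorter than 8 characters are
- padded with null bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_UNDF 000              /* not associated with any segment */
#define N_ABS  001              /* absolute address (value) given */
#define N_TEXT 002              /* located in the text segment */
#define N_DATA 003              /* located in initialized data */
#define N_BSS  004              /* located in uninitialized data */
#define N_REG  024              /* register name */
#define N_FN   037              /* names a file */
#define N_EXT  040              /* external bit, or'ed in */

#define V7_N_TYPE 037           /* assumed mask for the basic type */

/* Return the basic type with the external bit removed. */
static int v7_sym_base(int n_type)
{
    return n_type & V7_N_TYPE;
}

/* Return nonzero if the external bit is set. */
static int v7_sym_is_external(int n_type)
{
    return (n_type & N_EXT) != 0;
}

/*
 * Copy the 8-byte name field into out, which must hold 9 bytes.
 * A full 8-character name has no terminator in the entry, so one
 * is always appended here.
 */
static void v7_sym_name(const char *field, char *out)
{
    assert(field != NULL && out != NULL);
    memcpy(out, field, 8);
    out[8] = '\0';
}
```

- .PP
- A type of 042, for example, is an external symbol located in the text segment.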
-